Cross-language Entity Linking Adapting to User’s Language Ability

Authors

  • Jialiang Zhou
  • Fuminori Kimura
  • Akira Maeda
Abstract

In this paper, we propose a method that automatically discovers valuable keyphrases in a Japanese document and links them to related Chinese Wikipedia pages. The method has four stages. First, we extract nouns from the Japanese document using a morphological analyzer and extract keyphrase candidates using a method called Top Consecutive Nouns Cohesion (TCNC) [1]. We then judge the difficulty of the extracted keyphrases and tag them with linguistic difficulty levels. Second, we translate the extracted Japanese keyphrases into Chinese using a combination of three translation methods. Third, we retrieve the Chinese Wikipedia articles that correspond to the translated keyphrases. Fourth, we translate the original Japanese document into Chinese, build a vector of its noun frequencies, and calculate the cosine similarity between the translated document and each candidate Chinese Wikipedia article. Finally, we create links from the Japanese keyphrases to the top-ranking Chinese Wikipedia articles.

Keywords: entity linking; keyphrase extraction; Wikipedia; cross-language link discovery; linguistic difficulty level estimation
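The ranking step in the fourth stage (cosine similarity over noun-frequency vectors) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine_similarity` and `rank_candidates` are hypothetical, and the sketch assumes that the translated document and each candidate article have already been reduced to lists of nouns, so morphological analysis and translation are out of scope here.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse noun-frequency vectors."""
    # Only terms present in `a` can contribute to the dot product;
    # Counter returns 0 for terms missing from `b`.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_candidates(doc_nouns, candidate_nouns):
    """Rank candidate articles by similarity to the document.

    doc_nouns: list of nouns from the (already translated) document.
    candidate_nouns: dict mapping an article title to its list of nouns.
    Both inputs are assumed formats, chosen for illustration only.
    """
    doc_vec = Counter(doc_nouns)
    scored = [
        (title, cosine_similarity(doc_vec, Counter(nouns)))
        for title, nouns in candidate_nouns.items()
    ]
    # Highest similarity first; the top entry is the link target.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, the first element of the list returned by `rank_candidates` would be the article a keyphrase is linked to. Raw frequencies are used because the abstract mentions only noun frequencies; any weighting scheme such as TF-IDF would be a separate design choice.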

Similar resources

Creating and Curating a Cross-Language Person-Entity Linking Collection

To stimulate research in cross-language entity linking, we present a new test collection for evaluating the accuracy of cross-language entity linking in twenty-one languages. This paper describes an efficient way to create and curate such a collection, judiciously exploiting existing language resources. Queries are created by semi-automatically identifying person names on the English side of a ...

Cross-Language Person-Entity Linking from Twenty Languages

The goal of entity linking is to associate references to some entity that are found in unstructured natural language content to an authoritative inventory of known entities. This paper describes the construction of six test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with two crowdsourced validation stages t...

Cross-Language Entity Linking

There has been substantial recent interest in aligning mentions of named entities in unstructured texts to knowledge base descriptors, a task commonly called entity linking. This technology is crucial for applications in knowledge discovery and text data mining. This paper presents experiments in the new problem of cross-language entity linking, where documents and named entities are in a differ...

Building a Cross-Language Entity Linking Collection in Twenty-One Languages

We describe an efficient way to create a test collection for evaluating the accuracy of cross-language entity linking. Queries are created by semiautomatically identifying person names on the English side of a parallel corpus, using judgments obtained through crowdsourcing to identify the entity corresponding to the name, and projecting the English name onto the non-English document using word ...

Cross Lingual Entity Linking with Bilingual Topic Model

Cross lingual entity linking means linking an entity mention in a background source document in one language with the corresponding real world entity in a knowledge base written in the other language. The key problem is to measure the similarity score between the context of the entity mention and the document of the candidate entity. This paper presents a general framework for doing cross lingu...

Publication date: 2017